Abstract

In this paper, we describe an architecture for designing fuzzy con- 
trollers through a hierarchical process of control rule acquisition and 
by using special classes of neural network learning techniques. Hier- 
archical development of the fuzzy control rules is a useful technique 
which has been used earlier in designing a fuzzy controller with in- 
teractive goals [5]. Also, we introduce a new method for learning 
to refine a fuzzy logic controller. A reinforcement learning technique 
is used in conjunction with a multi-layer neural network model of a 
fuzzy controller. The model learns by updating its prediction of the
plant's behavior and is related to Sutton's Temporal Difference
(TD) method. The method proposed here has the advantage of using
the control knowledge of an experienced operator and fine-tuning it 
through the process of learning. The approach is applied to a cart-pole 
balancing system. 

1 Introduction

Fuzzy logic controllers have recently enjoyed considerable commercial success
[12,6]. These controllers are usually developed based on the knowledge of
human expert operators [4]. However, starting with the Self Organizing Con- 
trol (SOC) techniques of Mamdani and his students (e.g., [9]), the need for 
research in developing fuzzy logic controllers which can learn from experience 
has been realized (e.g., [8]). The learning task may include the identification 
of the main control parameters (i.e., related to the system identification in 
conventional and modern control theory) or development and fine-tuning of 
the fuzzy memberships used in the control rules. In this paper, we concen- 
trate on the latter learning task and develop a model which can learn to 
adjust the fuzzy memberships of the linguistic labels. 

The organization of this paper is as follows. We first discuss the general 
model of our NeuroFuzzy Controller (NFC) and then we apply this model to 
the control of a cart-pole balancing system. Finally, we compare this model
with related research such as credit assignment in artificial
intelligence [10], Barto et al.'s AHC model [3], and Lee and Berenji's single
layer model [8].


2 NFC: A Model for Intelligent Control 


Figure 1 illustrates the general model of our intelligent controller. The two
main elements in this model are the Action-state Evaluation Network (AEN), 
which acts as a critic and provides advice to the main controller, and the 
Action Selection Network (ASN) which includes a fuzzy controller. 

2.1 Action-state Evaluation Network (AEN) 

The only information received by the AEN is the state of the plant in terms
of its state variables and whether a failure has occurred or not. Figure 2
illustrates the structure of an evaluation network including $m_h$ hidden units
and $n$ input units from the environment (i.e., $x_0, x_1, \ldots, x_n$). The triangles
represent the calculation center [1] of the units, where the updating equations
(to be described below) are applied. The input from the environment is
provided to all hidden units and output units, and an interconnection weight
exists at every intersection. Therefore, in this network, hidden units receive
$n + 1$ inputs and have $n + 1$ weights each, while the output units receive
$n + 1 + m_h$ inputs and have $n + 1 + m_h$ weights.

Figure 1: The NFC Model for Intelligent Control

r: Failure signal
A: Matrix of weights on the arcs connecting the input layer to the hidden layer
B: Matrix of weights on the arcs connecting the input layer to the output layer
C: Matrix of weights on the arcs connecting the hidden layer to the output layer

Figure 2: The Evaluation Network

If $A$, $B$, and $C$ are the matrices of connection weights, then the output
of the evaluation network is:






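$$ v[t, t+1] = \sum_{i=1}^{n} b_i[t]\, x_i[t+1] + \sum_{i=1}^{m_h} c_i[t]\, y_i[t, t+1] \tag{1} $$

where

$$ y_i[t, t+1] = g\left( \sum_{j=1}^{n} a_{ij}[t]\, x_j[t+1] \right) \tag{2} $$

and

$$ g(s) = \frac{1}{1 + e^{-s}} \tag{3} $$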


In the above equations, double time dependencies are used to avoid
instabilities in the updating of weights [2]. This network evaluates the action
recommended by the action network as a function of the failure signal and
the change in state evaluation:

$$ \hat{r}[t+1] = \begin{cases} 0 & \text{if the state at time } t+1 \text{ is a start state;} \\ r[t+1] - v[t, t] & \text{if the state at time } t+1 \text{ is a failure state;} \\ r[t+1] + \gamma\, v[t, t+1] - v[t, t] & \text{otherwise.} \end{cases} \tag{4} $$

The weights in this network are modified according to the following:

$$ b_i[t+1] = b_i[t] + \beta\, \hat{r}[t+1]\, x_i[t] \tag{5} $$

$$ c_i[t+1] = c_i[t] + \beta\, \hat{r}[t+1]\, y_i[t, t] \tag{6} $$

$$ a_{ij}[t+1] = a_{ij}[t] + \beta_h\, \hat{r}[t+1]\, y_i[t, t]\, (1 - y_i[t, t])\, \mathrm{sgn}(c_i[t])\, x_j[t] \tag{7} $$

where $0 < \gamma < 1$ and $\beta, \beta_h > 0$.

D: Matrix of weights on the arcs connecting the input layer to the hidden layer
E: Matrix of weights on the arcs connecting the input layer to the output layer
F: Matrix of weights on the arcs connecting the hidden layer to the output layer

Figure 3: The Action Selection Network
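The evaluation network of equations (1) to (7) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the paper: the class name `EvaluationNetwork`, the initial weight range, and the default values of `beta`, `beta_h`, and `gamma` are assumptions made for the example, and the input vector `x` is assumed to already contain the bias term.

```python
import math
import random


def g(s):
    """Sigmoid squashing function of equation (3)."""
    return 1.0 / (1.0 + math.exp(-s))


def sgn(v):
    """Sign function: -1, 0, or +1."""
    return (v > 0) - (v < 0)


class EvaluationNetwork:
    """Sketch of the AEN; sizes, rates, and initial weights are assumed."""

    def __init__(self, n_inputs, n_hidden, beta=0.2, beta_h=0.05, gamma=0.9, seed=0):
        rng = random.Random(seed)
        self.a = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]                          # input -> hidden
        self.b = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]   # input -> output
        self.c = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]   # hidden -> output
        self.beta, self.beta_h, self.gamma = beta, beta_h, gamma

    def hidden(self, x):
        """Equation (2): hidden outputs for state x under the current weights."""
        return [g(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) for row in self.a]

    def value(self, x):
        """Equation (1): evaluation of state x under the current weights."""
        y = self.hidden(x)
        return (sum(b_i * x_i for b_i, x_i in zip(self.b, x))
                + sum(c_i * y_i for c_i, y_i in zip(self.c, y)))

    def internal_reinforcement(self, r_next, x, x_next, start=False, failure=False):
        """Equation (4); both evaluations use the weights of time t."""
        if start:
            return 0.0
        v_tt = self.value(x)
        if failure:
            return r_next - v_tt
        return r_next + self.gamma * self.value(x_next) - v_tt

    def update(self, r_hat, x):
        """Equations (5) to (7), all computed from the weights of time t."""
        y = self.hidden(x)
        signs = [sgn(c_i) for c_i in self.c]  # sgn(c_i[t]) taken before c changes
        self.b = [b_i + self.beta * r_hat * x_i for b_i, x_i in zip(self.b, x)]
        self.c = [c_i + self.beta * r_hat * y_i for c_i, y_i in zip(self.c, y)]
        for i, row in enumerate(self.a):
            step = self.beta_h * r_hat * y[i] * (1.0 - y[i]) * signs[i]
            self.a[i] = [a_ij + step * x_j for a_ij, x_j in zip(row, x)]
```

Calling `internal_reinforcement` before `update` keeps the double time dependency: the evaluations of the states at times t and t+1 are both computed with the weights of time t.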


2.2 Action Selection Network (ASN) 

The Action Selection Network (ASN) includes a fuzzy controller which
consists of a fuzzifier, a rule base and decision making logic, and a defuzzifier, all
represented in a network. The design of the rule base for this fuzzy controller
follows the algorithm developed in [5], which is based on a hierarchical process
considering the interaction of multiple goals.

In this paper, the above fuzzy controller is modeled by a two-layered
neural network where the input layer includes the fuzzifier, whose task is to
match the values of the input variables against the labels used in the fuzzy
control rules. The hidden layer in this network corresponds to the rules
used in the controller and includes the decision making logic. The output
layer includes the decoding (defuzzification) process. In the following, a brief
explanation of fuzzy logic control is provided; for more detailed
information, see [4]. The action selector is shown in Figure 3, where the
matrices of connection weights are $D$, $E$, and $F$. The individual members
of these matrices are labelled $d_{ij}$, $e_i$, and $f_i$. In this network, each hidden
node represents a fuzzy control rule in the following manner: the inputs to
the node are the preconditions of a rule and the output of the node is its
conclusion. We assume a Multi Input Single Output (MISO) control system.
The output layer combines the conclusions of the individual rules by using
the Center Of Area (COA) method [4], which is described below. Let $w(i)$
represent the degree to which rule $i$ is satisfied by the input state variables in $X$,
which means

$$ w(i) = \min\{ d_{i1}\mu_{i1}(x_1),\ d_{i2}\mu_{i2}(x_2),\ \ldots,\ d_{in}\mu_{in}(x_n) \} \tag{8} $$

where $\mu_{i1}(x_1)$ represents the degree of membership of the input $x_1$ in a fuzzy
set representing the label used in the first precondition of rule $i$, and $n$
is the number of inputs. Then $m(i)$, which represents the result of applying
$w(i)$ to the conclusion of rule $i$, is calculated from

$$ w(i) = \mu_{C_i}(m(i)) \tag{9} $$

where $\mu_{C_i}$ represents the monotonic membership function of the label used
in the conclusion of rule $i$. The amount of the control action (i.e., $u$) is then
calculated by using the Center Of Area (COA) method as follows.
Assuming discretized membership functions, COA yields

$$ u(t) = \frac{\sum_{i=1}^{m_h} f_i \times m(i) \times w(i)}{\sum_{i=1}^{m_h} w(i) \times f_i} \tag{10} $$

where $m_h$ is the number of nodes in the hidden layer, which is equivalent to
the number of rules used in the model. We define two more functions here:
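$$ z_i[t] = g\left( \sum_{j=1}^{n} d_{ij}[t]\, x_j[t] \right) \tag{11} $$

$$ p[t] = g\left( \sum_{i=1}^{n} e_i[t]\, x_i[t] + \sum_{i=1}^{m_h} f_i[t]\, z_i[t] \right) \tag{12} $$

and

$$ q[t] = \begin{cases} 1, & \text{with probability } p[t]; \\ 0, & \text{with probability } 1 - p[t]. \end{cases} \tag{13} $$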


The connection weights are updated according to the following:

$$ e_i[t+1] = e_i[t] + \rho\, \hat{r}[t+1]\, (q[t] - p[t])\, x_i[t] \tag{14} $$

$$ f_i[t+1] = f_i[t] + \rho\, \hat{r}[t+1]\, (q[t] - p[t])\, z_i[t] \tag{15} $$

$$ d_{ij}[t+1] = d_{ij}[t] + \rho_h\, \hat{r}[t+1]\, z_i[t]\, (1 - z_i[t])\, \mathrm{sgn}(f_i[t])\, (q[t] - p[t])\, x_j[t] \tag{16} $$

where $\rho, \rho_h > 0$.
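As a rough illustration of equations (11) to (16), the following Python sketch computes the hidden activations, the probability p, and the random outcome q, and then applies one weight update. The function names `asn_step` and `asn_update`, the default learning rates, and the small example weights are all assumptions introduced for the sketch; the internal reinforcement `r_hat` is taken as given from the evaluation network.

```python
import math
import random


def g(s):
    """Sigmoid squashing function, as in equation (3)."""
    return 1.0 / (1.0 + math.exp(-s))


def sgn(v):
    """Sign function: -1, 0, or +1."""
    return (v > 0) - (v < 0)


def asn_step(d, e, f, x, rng):
    """Equations (11) to (13): hidden outputs z, probability p, random q."""
    z = [g(sum(d_ij * x_j for d_ij, x_j in zip(row, x))) for row in d]
    p = g(sum(e_i * x_i for e_i, x_i in zip(e, x))
          + sum(f_i * z_i for f_i, z_i in zip(f, z)))
    q = 1 if rng.random() < p else 0
    return z, p, q


def asn_update(d, e, f, x, z, p, q, r_hat, rho=1.0, rho_h=0.2):
    """Equations (14) to (16); returns new weights, leaving the old ones intact."""
    delta = r_hat * (q - p)  # common factor r_hat[t+1] * (q[t] - p[t])
    new_e = [e_i + rho * delta * x_i for e_i, x_i in zip(e, x)]
    new_f = [f_i + rho * delta * z_i for f_i, z_i in zip(f, z)]
    new_d = [
        [d_ij + rho_h * delta * z_i * (1.0 - z_i) * sgn(f_i) * x_j
         for d_ij, x_j in zip(row, x)]
        for row, z_i, f_i in zip(d, z, f)
    ]
    return new_d, new_e, new_f


# Hypothetical two-input, two-rule example.
rng = random.Random(0)
d = [[0.1, -0.2], [0.3, 0.05]]
e = [0.0, 0.1]
f = [0.2, -0.1]
x = [1.0, 0.5]
z, p, q = asn_step(d, e, f, x, rng)
d_new, e_new, f_new = asn_update(d, e, f, x, z, p, q=1, r_hat=1.0)
```

With a positive internal reinforcement and q = 1, the update raises p for the same input, so an outcome that was followed by an improved evaluation becomes more likely.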


3 Applying NFC to Cart-Pole Balancing 

In this section, we describe the cart-pole balancing problem and apply the 
NFC model to its control. 

3.1 The Cart-Pole balancing problem 

In this system a pole is hinged to a motor-driven cart which moves on rail 
tracks to its right or its left. The pole has only one degree of freedom (rotation 
about the hinge point). The primary control tasks are to keep the pole
vertically balanced and keep the cart within the rail track boundaries.

Four state variables are used to describe the system status, and one vari- 
able represents the force applied to the cart. These are: 
$x$ : horizontal position of the cart on the rail

$\dot{x}$ : velocity of the cart

$\theta$ : angle of the pole with respect to the vertical line

$\dot{\theta}$ : angular velocity of the pole

$u$ : force applied to the cart.

We assume that a failure happens when $|\theta| > 12$ degrees or $|x| > 2.4$
meters. Also, we assume that the equations of motion of the cart-pole system 
are not known to the controller and only a vector describing the cart-pole 
system’s state at each time step is known. In other words, the cart-pole 
balancing system is treated as a black box by the learning system. 
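The failure condition stated above can be written as a small Python check. This is a sketch under stated assumptions: the function names are hypothetical, the angle is taken in degrees and the position in meters, and the numerical value of the failure signal (here -1 on failure and 0 otherwise) is an assumption, since the text does not specify it.

```python
THETA_LIMIT_DEG = 12.0  # pole angle limit, in degrees
X_LIMIT_M = 2.4         # cart position limit, in meters


def has_failed(x_m, theta_deg):
    """True when the pole angle or the cart position exceeds its limit."""
    return abs(theta_deg) > THETA_LIMIT_DEG or abs(x_m) > X_LIMIT_M


def failure_signal(x_m, theta_deg, failure_value=-1.0):
    """Failure signal r; the value used on failure is an assumed convention."""
    return failure_value if has_failed(x_m, theta_deg) else 0.0
```

Only this signal and the state vector reach the learning system, which is what allows the cart-pole dynamics to be treated as a black box.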

Figure 4 presents the model of NFC as it is applied to this problem. 
Among the components of this model, we only describe the Action Selection 
Network here. 


3.2 The Action Selection Network 

The action network was modeled by defining a multi-layered neural network 
which receives reinforcements from the evaluation network. This network, 
as shown in Figure 4, consists of 5 input nodes representing the four state 
variables and a bias unit, 13 nodes in the hidden layer, and an output node. 
The nodes in the hidden layer correspond to the fuzzy control rules. For 
example, node 1 corresponds to the rule: 

IF $\theta$ is Positive and $\dot{\theta}$ is Positive THEN Force is Positive-Large.

As mentioned earlier, the rule base of a fuzzy controller consists of rules which 
are described using linguistic variables. As shown in Figure 5(a) and Figure 
5(b), three labels are used here to linguistically define the value of the state 
variables: Positive (P), Zero (Z), and Negative (N). Seven labels are used 
to linguistically define the value of force recommended by each control rule: 
Positive Large (PL), Positive Medium (PM), Positive Small (PS), Zero (ZE), 
Negative Small (NS), Negative Medium (NM), and Negative Large (NL). 
The forward calculations in this network are based on fuzzy logic control as
described in [5], where nine fuzzy control rules were written for balancing
the pole vertically and four control rules were used in positioning the cart at
a specific location on the rail tracks. The presence of a connection between
an input node $j$ and a node $i$ in the hidden layer indicates that the linguistic
value of the input corresponding to node $j$ is used as a precondition in rule
$i$. As shown in Figure 4, the first nine rules, corresponding to the hidden
layer nodes 1 to 9, are rules with two preconditions (i.e., $\theta$ and $\dot{\theta}$). Rules
10 through 13 include four preconditions representing the linguistic values
of $\theta$, $\dot{\theta}$, $x$, and $\dot{x}$. In this network, $D$ represents the matrix of connection
weights between the input layer and the hidden layer, and $F$ represents a
vector of connection weights between the hidden layer and the output node.
The amount of force applied to the cart is calculated using equations (8)
to (10), as given in the last section.

Figure 4: NFC applied to cart-pole balancing

Figure 5: (a) Three qualitative levels for $\theta$, $\dot{\theta}$, $x$, and $\dot{x}$; (b) seven qualitative
levels for $F$
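The force calculation of equations (8) to (10) can be illustrated with a small Python sketch. Everything specific in it is an assumption made for the example: the ramp-shaped membership functions on normalized inputs, the linear monotonic conclusion memberships, the force range of 10 units, the unit weights, and the second rule, which is a hypothetical mirror image of the rule for node 1. The actual shapes are those of Figure 5.

```python
def ramp(v):
    """Assumed membership shape: 0 for v <= 0, rising linearly to 1 at v = 1."""
    return max(0.0, min(1.0, v))


def positive(v):
    """Assumed membership of a normalized input in the label Positive."""
    return ramp(v)


def negative(v):
    """Assumed membership of a normalized input in the label Negative."""
    return ramp(-v)


def rule_strength(d_row, memberships, inputs):
    """Equation (8): minimum of the weighted precondition memberships."""
    return min(d * mu(x) for d, mu, x in zip(d_row, memberships, inputs))


def conclusion_value(w, u_at_zero, u_at_one):
    """Equation (9) solved for m(i), assuming a linear monotonic membership
    that is 0 at u_at_zero and 1 at u_at_one."""
    return u_at_zero + w * (u_at_one - u_at_zero)


def center_of_area(f, m, w):
    """Equation (10): weighted average of the rule conclusions."""
    denominator = sum(f_i * w_i for f_i, w_i in zip(f, w))
    if denominator == 0.0:
        return 0.0  # no rule fires; returning zero force is an assumption
    return sum(f_i * m_i * w_i for f_i, m_i, w_i in zip(f, m, w)) / denominator


# Each rule: (d weights, precondition memberships, conclusion range).
rules = [
    ([1.0, 1.0], [positive, positive], (0.0, 10.0)),   # THEN Force is Positive-Large
    ([1.0, 1.0], [negative, negative], (0.0, -10.0)),  # hypothetical mirror rule
]
f = [1.0, 1.0]  # hidden-to-output weights, all set to 1 for the example


def control_action(inputs):
    """Force for normalized inputs (theta, theta_dot) under the rules above."""
    w = [rule_strength(d, mus, inputs) for d, mus, _ in rules]
    m = [conclusion_value(w_i, lo, hi) for w_i, (_, _, (lo, hi)) in zip(w, rules)]
    return center_of_area(f, m, w)


u = control_action([0.5, 0.25])  # only the first rule fires, with strength 0.25
```

In this example the first rule fires with strength min(0.5, 0.25) = 0.25, its conclusion is m = 2.5, and because no other rule fires the resulting force is u = 2.5.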


4 Relation to other research 

Credit Assignment The evaluation network in our work is similar to
Samuel's early work on credit assignment [10]. The Adaptive Heuristic Critic
(AHC) model of Barto et al. [3] provides a more general approach to credit
assignment which learns by updating the predictions of failures. If no failure
signal is present, the internal reinforcement provided by AHC is just the
difference between the successive predictions of failure. Recently, Sutton [11]
has formalized this method as Temporal Difference methods.

Anderson's Multi-layer Networks We use the same structure as proposed
by Anderson [2]; however, the action selection network in our model
is based on fuzzy logic control. Using the structure of a fuzzy controller,
Anderson's approach is extended here to provide for the following attributes
in NFC.

• The continuous representation of the output value. 

• The inclusion of the human expert operator’s control rules in terms of 
hidden units in the action selection network. 

It should be noted that Anderson’s goal in [1] was to discover the interesting 
patterns and strategy learning schemes. Not much effort was spent on making 
the process learn faster. In our work, although we allow some of the strategy 
learning to happen automatically, we start from a knowledge base of fuzzy 
control rules and fine-tune them as learning happens in the neural network. 


Single Layer NeuroFuzzy Control Lee and Berenji [8] and Lee [7] have 
used a single layer neural network which requires the identification of the 
trace functions for keeping track of the visited states and their evaluations. 
The generation of these trace functions is a difficult task in larger control
problems. However, the approach suggested in the current paper does not
use trace functions. The neural network representation of the fuzzy control
rules in NFC allows faster development and faster learning. Also, in the
single layer model, only the generation of the output values was considered.
The preconditions of the fuzzy control rules were left untouched. However, 
in NFC, based on reinforcements received from the environment, both the 
preconditions and the conclusions of rules can be modified (i.e., fine-tuned). 


5 Conclusion 

A new model based on the reinforcement learning technique and fuzzy logic 
control was proposed which is applicable to control problems for which the 
analytical models of the process are unknown. The NFC model presented
here improves on the previous models in neurofuzzy control by learning to
fine-tune the performance of a fuzzy logic controller.